A Case for Near Memory Computation Inside the Smart Memory Cube
نویسندگان
چکیده
3D integration of solid-state memories and logic, as demonstrated by the Hybrid Memory Cube (HMC), offers major opportunities for revisiting near-memory computation and gives new hope to mitigate the power and performance losses caused by the “memory wall”. In this paper we present the first exploration steps towards design of the Smart Memory Cube (SMC), a new Processor-in-Memory (PIM) architecture that enhances the capabilities of the logic-base (LoB) in HMC. An accurate simulation environment has been developed, along with a full featured software stack. All offloading and dynamic overheads caused by the operating system, cache coherence, and memory management are considered, as well. Benchmarking results demonstrate up to 2X performance improvement in comparison with the host SoC, and around 1.5X against a similar host-side accelerator. Moreover, by scaling down the voltage and frequency of PIM’s processor it is possible to reduce energy by around 70% and 55% in comparison with the host and the accelerator, respectively.
منابع مشابه
A Logic-base Interconnect for Supporting Near Memory Computation in the Hybrid Memory Cube
Hybrid Memory Cube (HMC) has promised to improve bandwidth, power consumption, and density for nextgeneration main memory systems. In addition, 3D integration gives a “second shot” for revisiting near memory computation to fill the gap between the processors and memories. In this paper, we study the concept of “Smart Memory Cube (SMC)”, a fully backward compatible and modular extension to the s...
متن کاملCondensed Cube: An Effective Approach to Reducing Data Cube Size
Pre-computed data cube facilitates OLAP (On-Line Analytical Processing). It is a well-known fact that data cube computation is an expensive operation, which attracts a lot of attention. While most proposed algorithms devoted themselves to optimizing memory management and reducing computation costs, less work addresses one of the fundamental issues: the size of a data cube is huge when a large b...
متن کاملA Message-Passing Distributed Memory Parallel Algorithm for a Dual-Code Thin Layer, Parabolized Navier-Stokes Solver
In this study, the results of parallelization of a 3-D dual code (Thin Layer, Parabolized Navier-Stokes solver) for solving supersonic turbulent flow around body and wing-body combinations are presented. As a serial code, TLNS solver is very time consuming and takes a large part of memory due to the iterative and lengthy computations. Also for complicated geometries, an exceeding number of grid...
متن کاملActive Memory Cube: A processing-in-memory architecture for exascale systems
A processing-in-memory architecture for exascale systems R. Nair S. F. Antao C. Bertolli P. Bose J. R. Brunheroto T. Chen C.-Y. Cher C. H. A. Costa J. Doi C. Evangelinos B. M. Fleischer T. W. Fox D. S. Gallo L. Grinberg J. A. Gunnels A. C. Jacob P. Jacob H. M. Jacobson T. Karkhanis C. Kim J. H. Moreno J. K. O’Brien M. Ohmacht Y. Park D. A. Prener B. S. Rosenburg K. D. Ryu O. Sallenave M. J. Ser...
متن کاملA High Performance Parallel IP Lookup Technique Using Distributed Memory Organization and ISCB-Tree Data Structure
The IP Lookup Process is a key bottleneck in routing due to the increase in routing table size, increasing traıc and migration to IPv6 addresses. The IP address lookup involves computation of the Longest Prefix Matching (LPM), which existing solutions such as BSD Radix Tries, scale poorly when traıc in the router increases or when employed for IPv6 address lookups. In this paper, we describe a ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2015